Floating-Point Arithmetic


Docs for Reference

The Problem with Floating Point

For example, the real number 0.37 cannot be represented exactly as a finite sum of powers of two, so if you assign it to a single-precision floating-point variable, the value actually stored is approximately 0.370000004. This is easy to see with a simple program that prints a floating-point value to many decimal places.

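// Print a floating-point number to 20 decimal places.
#include <stdio.h>

int main(void)
{
    float f = 0.37f;        // nearest float to 0.37, not 0.37 itself
    printf("%.20f\n", f);   // f is promoted to double; the stored value is unchanged
    return 0;
}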

More examples:

Rounding Error

Floating-point Formats

Several different representations of real numbers have been proposed, but by far the most widely used is the floating-point representation.[1] Floating-point representations have a base $\beta$ (which is always assumed to be even) and a precision $p$. If $\beta = 2$ and $p = 24$, then the decimal number 0.1 cannot be represented exactly, but is approximately $1.10011001100110011001101 \times 2^{-4}$.

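In general, a floating-point number will be represented as $\pm d.dd \ldots d \times \beta^e$, where $d.dd \ldots d$ is called the significand and has $p$ digits. More precisely, $\pm d_0.d_1 d_2 \ldots d_{p-1} \times \beta^e$ represents the number

$$\pm \left(d_0 + d_1 \beta^{-1} + \cdots + d_{p-1} \beta^{-(p-1)}\right) \beta^e, \quad (0 \le d_i < \beta)$$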

Footnotes:

[1] Examples of other representations are floating slash and signed logarithm [Matula and Kornerup 1985; Swartzlander and Alexopoulos 1975].

Author: Shi Shougang

Created: 2015-03-05 Thu 23:19

Emacs 24.3.1 (Org mode 8.2.10)
